Abstract. We study whether, when restricted to using polylogarithmic memory and polylogarithmic
passes, we can achieve qualitatively better data compression with multiple read/write streams than
we can with only one. We first show how we can achieve universal compression using only one pass
over one stream. We then show that one stream is not sufficient for us to achieve good
grammar-based compression. Finally, we show that two streams are necessary and sufficient for us
to achieve entropy-only bounds.

1 Introduction

Massive datasets seem to expand to fill the space available and, in situations where they no
longer fit in memory and must be stored on disk, we may need new models and algorithms.
Grohe and Schweikardt [21] introduced read/write streams to model situations in which we
want to process data using mainly sequential accesses to one or more disks. As the name
suggests, this model is like the streaming model (see, e.g., [28]) but, as is reasonable with
datasets stored on disk, it allows us to make multiple passes over the data, change them and
even use multiple streams (i.e., disks). As Grohe and Schweikardt pointed out, sequential disk
accesses are much faster than random accesses, potentially bypassing the von Neumann
bottleneck, and using several disks in parallel can greatly reduce the amount of memory
and the number of accesses needed. For example, when sorting, we need the product of the 
memory and accesses to be at least linear when we use one disk [2"Tf20] but only polylogarith- 
mic when we use two [9|2T] . Similar bounds have been proven for a number of other problems, 
such as checking set disjointness or equality; we refer readers to Schweikardt's survey [31] of 
upper and lower bounds with one or more read/write streams, Hernich and Schweikardt's
paper [23] relating read/write streams to classic complexity theory, and Beame and Huynh's 
paper [4] on the value of multiple read/write streams for approximating frequency moments.

Since sorting is an important operation in some of the most powerful data compression
algorithms, and compression is an important operation for reducing massive datasets to
a more manageable size, we wondered whether extra streams could also help us achieve
better compression. In this paper we consider the problem of compressing a string s of n
characters over an alphabet of size σ when we are restricted to using log^{O(1)} n bits of memory
and log^{O(1)} n passes over the data. Throughout, we write log to mean log_2 unless otherwise
stated. In Section 2 we show how we can achieve universal compression using only one pass
over one stream. Our approach is to break the string into blocks and compress each block
separately, similar to what is done in practice to compress large files. Although this may
not usually significantly worsen the compression itself, it may stop us from then building
a fast compressed index (see [29] for a survey) unless we somehow combine the indexes
for the blocks, or from clustering by compression [11] (since concatenating files should not help
us compress them better if we then break them into pieces again). In Section 3 we use a
vaguely automata-theoretic argument to show one stream is not sufficient for us to achieve
good grammar-based compression. Of course, by 'good' we mean here something stronger
than universal compression: we want to build a context-free grammar that generates s and
only s and whose size is nearly minimum. In a paper with Gawrychowski [17] we showed
that with constant memory and logarithmic passes over a constant number of streams, we
can build a grammar whose size is at most quadratic in the minimum. Finally, in Section 4
we show that two streams are necessary and sufficient for us to achieve entropy-only bounds.
Along the way, we show we need two streams to find strings' minimum periods or compute
the Burrows-Wheeler Transform. As far as we know, this is the first paper on compression
with read/write streams, and among the first papers on compression in any streaming model;
we hope the techniques we have used will prove to be of independent interest.

2 Universal compression 

An algorithm is called universal with respect to a class of sources if, when a string is drawn
from any of those sources, the algorithm's redundancy per character approaches 0 with
probability 1 as the length of the string grows. The class most often considered, and which
we consider in this section, is that of stationary, ergodic Markov sources (see, e.g., [12]). Since
the kth-order empirical entropy H_k(s) of s is the minimum self-information per character of
s with respect to a kth-order Markov source (see [33]), an algorithm is universal if it stores
any string s in nH_k(s) + o(n) bits for any fixed alphabet size σ and k. The kth-order empirical
entropy of s is also our expected uncertainty about a randomly chosen character of s when given
the k preceding characters. Specifically,

\[
H_k(s) =
\begin{cases}
\frac{1}{n} \sum_{a \in s} \mathrm{occ}(a, s) \log \frac{n}{\mathrm{occ}(a, s)} & \text{if } k = 0, \\
\frac{1}{n} \sum_{|w| = k} |w_s| \, H_0(w_s) & \text{otherwise,}
\end{cases}
\]

where occ(a, s) is the number of times character a occurs in s, and w_s is the concatenation
of those characters immediately following occurrences of k-tuple w in s.
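
As a concrete illustration of the definition above, the following Python sketch computes
H_k(s) directly from the formula. The function names h0 and hk are ours, not the paper's,
and the code holds the whole string in memory, so it illustrates the quantity being bounded
rather than any streaming algorithm.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """0th-order empirical entropy of s, in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    # (1/n) * sum over distinct characters a of occ(a, s) * log(n / occ(a, s))
    return sum(c * log2(n / c) for c in Counter(s).values()) / n


def hk(s, k):
    """kth-order empirical entropy of s, in bits per character."""
    if k == 0:
        return h0(s)
    # followers[w] plays the role of w_s: the characters that
    # immediately follow occurrences of the k-tuple w in s.
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(ws) * h0(ws) for ws in followers.values()) / len(s)
```

For instance, h0("abababab") is 1 bit per character, while hk("abababab", 1) is 0 because
every character determines its successor.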

In a previous paper [19] we showed how to modify the well-known LZ77 compression
algorithm [22] to use sublinear memory while still storing s in nH_k(s) + O(n log log n / log n)
bits for any fixed σ and k. Our algorithm uses nearly linear memory and so does not fit into
the model we consider in this paper, but we mention it here because it fits into some other
streaming models (see, e.g., [28]) and, as far as we know, was the first compression algorithm
to do so. In the same paper we proved several lower bounds using ideas that eventually led
to our lower bounds in Sections 3 and 4 of this paper.

Theorem 1 (Gagie and Manzini, 2007). We can achieve universal compression using
one pass over one stream and O(n / log^2 n) bits of memory.

To achieve universal compression with only polylogarithmic memory, we use an algorithm
due to Gupta, Grossi and Vitter [22]. Although they designed it for the RAM model, we can
easily turn it into a streaming algorithm by processing s in small blocks and compressing
each block separately.

Theorem 2 (Gupta, Grossi and Vitter, 2008). In the RAM model, we can store any
string s in nH_k(s) + O(σ^k log n) bits, for all k simultaneously, using O(n) time.

Corollary 1. We can achieve universal compression using one pass over one stream and
O(log^{1+ε} n) bits of memory.

Proof. We process s in blocks of log^ε n characters, as follows: we read each block into memory,
apply Theorem 2 to it, output the result, empty the memory, and move on to the next block.
(If n is not given in advance, we increase the block size as we read more characters.) Since
Gupta, Grossi and Vitter's algorithm uses O(n) time in the RAM model, it uses O(n log n)
bits of memory and we use O(log^{1+ε} n) bits of memory. If the blocks are s_1, ..., s_b, then we
store all of them in a total of

\[
\sum_{i=1}^{b} \left( |s_i| H_k(s_i) + O(\sigma^k \log \log n) \right) \le nH_k(s) + O\left( \sigma^k n \log \log n / \log^\epsilon n \right)
\]

bits for all k simultaneously. Therefore, for any fixed σ and k, we store s in nH_k(s) + o(n)
bits. □
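
The block-by-block strategy in the proof can be sketched in Python as follows. This is only
an illustration under loud assumptions: zlib stands in for Gupta, Grossi and Vitter's
compressor (which it is not), the block size is a caller-supplied constant rather than
log^ε n, and compress_blocks, compress_bytes and decompress_blocks are hypothetical names.

```python
import io
import zlib


def compress_blocks(reader, block_size):
    """One pass over `reader`: compress each block on its own and emit it."""
    while True:
        block = reader.read(block_size)  # only this block is held in memory
        if not block:
            break
        # zlib is a stand-in compressor, NOT the algorithm of Theorem 2.
        yield zlib.compress(block)


def compress_bytes(data, block_size):
    """Convenience wrapper for in-memory data."""
    return list(compress_blocks(io.BytesIO(data), block_size))


def decompress_blocks(chunks):
    """Invert compress_blocks by decompressing and concatenating the chunks."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)
```

Because every chunk is compressed independently, repetitions that span block boundaries
are not exploited, which is the source of the extra redundancy term in the analysis.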



A bound of nH_k(s) + O(σ^k n log log n / log^ε n) bits is not very meaningful when k is not
fixed and grows as fast as log log n, because the second term is ω(n). Notice, however, that
Gupta et al.'s bound of nH_k(s) + O(σ^k log n) bits is also not very meaningful when k ≥ log n,
for the same reason. As we will see in Section 4, it is possible for s to be fairly incompressible
but still to have H_k(s) = 0 for k ≥ log n. It follows that, although we can prove bounds that
hold for all k simultaneously, those bounds cannot guarantee good compression in terms of
H_k(s) when k ≥ log n.



By using larger blocks (and, thus, more memory) we can reduce the O(σ^k n log log n / log^ε n)
redundancy term in our analysis, allowing k to grow faster than log log n while still having a
meaningful bound. Specifically, if we process s in blocks of c characters, then we use O(c log n)
bits of memory and achieve a redundancy term of O(σ^k n log c / c), allowing k to grow nearly
as fast as log_σ c while still having a meaningful bound. We will show later, in Theorem 15,
that this tradeoff is nearly optimal: if we use m bits of memory and p passes over one stream
and our redundancy term is O(σ^k r), then mpr = Ω(n / f(n)) for any function f that
increases without bound. It is not clear to us, however, whether we can modify Corollary 1 to
take advantage of multiple passes.

Open Problem 1 With multiple passes over one stream, can we achieve better bounds on 
the memory and redundancy than we can with one pass? 

3 Grammar-based compression 

Charikar et al. [8] and Rytter [32] independently showed how to build a nearly minimal
context-free grammar APPROX that generates s and only s. Specifically, their algorithms
yield grammars that are an O(log n) factor larger than the smallest such grammar OPT,
which has size Ω(log n) bits.

Theorem 3 (Charikar et al., 2005; Rytter, 2003). In the RAM model, we can approximate
the smallest grammar with |APPROX| = O(|OPT|^2) using O(n) time.

In this section we prove that, if we use only one stream, then in general our approximation
must be superpolynomially larger than the smallest grammar. Our idea is to show that
periodic strings whose periods are asymptotically slightly larger than the product of the
memory and passes can be encoded as small grammars but, in general, cannot be compressed
well by algorithms that use only one stream. Our argument is based on the following two
lemmas.

Lemma 1. If s has period ℓ, then the size of the smallest grammar for that string is
O(ℓ log σ + log n log log n) bits.

Proof. Let t be the repeated substring and t' be the proper prefix of t such that s = t^{⌊n/ℓ⌋} t'.
We can encode a unary string X^{⌊n/ℓ⌋} as a grammar G_1 with O(log n) productions of total
size O(log n log log n) bits. We can also encode t and t' as grammars G_2 and G_3 with O(ℓ)
productions of total size O(ℓ log σ) bits. Suppose S_1, S_2 and S_3 are the start symbols of
G_1, G_2 and G_3, respectively. By combining those grammars and adding the productions
S_0 → S_1 S_3 and X → S_2, we obtain a grammar with O(ℓ + log n) productions of total size
O(ℓ log σ + log n log log n) bits that maps S_0 to s. □
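
The construction in the proof of Lemma 1 can be made concrete with a short Python sketch.
It is one possible realisation, not the paper's: the unary string is built by repeated
doubling (the symbols <P0>, <P1>, ... are our own device for G_1), for brevity t and t' are
each written as a single production rather than O(ℓ) productions, and the names
periodic_grammar and expand are hypothetical. Nonterminals are written in angle brackets
so they cannot collide with single-character terminals.

```python
def periodic_grammar(s, period):
    """Build a small grammar for a string s that has the given period."""
    reps = len(s) // period          # number of full copies of t (assumed >= 1)
    t, tail = s[:period], s[reps * period:]
    rules = {
        "<S2>": list(t),             # G_2 generates t
        "<S3>": list(tail),          # G_3 generates t'
        "<X>": ["<S2>"],             # X -> S_2
        "<P0>": ["<X>"],
    }
    # G_1: <Pi> generates X^(2^i), so O(log reps) doubling rules suffice.
    bits = reps.bit_length()
    for i in range(1, bits):
        rules[f"<P{i}>"] = [f"<P{i - 1}>", f"<P{i - 1}>"]
    # S_1 concatenates the powers of two that sum to reps.
    rules["<S1>"] = [f"<P{i}>" for i in range(bits) if (reps >> i) & 1]
    rules["<S0>"] = ["<S1>", "<S3>"]  # S_0 -> S_1 S_3
    return rules


def expand(rules, symbol="<S0>"):
    """Expand a symbol into the string it generates."""
    if symbol not in rules:
        return symbol                # terminal character
    return "".join(expand(rules, x) for x in rules[symbol])
```

The number of rules grows with the period plus the logarithm of the number of repetitions,
which mirrors the O(ℓ + log n) production count in the lemma.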

Lemma 2. Consider a lossless compression algorithm that uses only one stream, and a 
machine performing that algorithm. We can compute any substring from 

— its length; 

— for each pass, the machine's memory configurations when it reaches and leaves the part 
of the stream that initially holds that substring; 

— all the output the machine produces while over that part. 

Proof. Let t be the substring and assume, for the sake of a contradiction, that there exists 
another substring t' with the same length that takes the machine between the same con- 
figurations while producing the same output. Then we can substitute t' for t in s without 
changing the machine's complete output, contrary to our specification that the compression 
be lossless. □ 

Lemma [2] implies that, for any substring, the size of the output the machine produces 
while over the part of the stream that initially holds that substring, plus twice the product of 
the memory and passes (i.e., the number of bits needed to store the memory configurations), 
must be at least that substring's complexity. Therefore, if a substring is not compressible by 
more than a constant factor (as is the case for most strings) and asymptotically larger than 
the product of the memory and passes, then the size of the output for that substring must 
be at least proportional to the substring's length. In other words, the algorithm cannot take 
full advantage of similarities between substrings to achieve better compression. In particular, 
if s is periodic with a period that is asymptotically slightly larger than the product of the 
memory and passes, and s's repeated substring is not compressible by more than a constant 
factor, then the algorithm's complete output must be Ω(n) bits. By Lemma 1, however, the
size of the smallest grammar that generates s and only s is bounded in terms of the period.

Theorem 4. With one stream, we cannot approximate the smallest grammar with
|APPROX| ≤ |OPT|^{O(1)}.

Proof. Suppose an algorithm uses only one stream, m bits of memory and p passes to
compress s, with mp = log^{O(1)} n, and consider a machine performing that algorithm.
Furthermore, suppose s is binary and periodic with period mp log n and its repeated substring
t is not compressible by more than a constant factor. Lemma 2 implies that the machine's
output while over a part of the stream that initially holds a copy of t must be
Ω(mp log n − mp) = Ω(mp log n) bits. Therefore, the machine's complete output must be Ω(n)
bits. By Lemma 1, however, the size of the smallest grammar that generates s and only s is
O(mp log n + log n log log n) ⊂ log^{O(1)} n bits. Since n = log^{ω(1)} n, the algorithm's complete
output is superpolynomially larger than the smallest grammar. □

As an aside, we note that a symmetric argument shows that, with only one stream, in 
general we cannot decode a string encoded as a small grammar. To see why, instead of 
considering a part of the stream that initially holds a copy of the repeated substring t, 
consider a part that is initially blank and eventually holds a copy of t. (Since s is periodic 
and thus very compressible, its encoding takes up only a fraction of the space it eventually 
occupies when decompressed; without loss of generality, we can assume the rest is blank.) 
An argument similar to the proof of Lemma [2] shows we can compute t from the machine's 
memory configurations when it reaches and leaves that part, so the product of the memory 
and passes must again be greater than or equal to t's complexity. 

Theorem 5. With one stream, we cannot decompress strings encoded as small grammars. 

Theorem 4 also has the following corollary, which may be of independent interest.

Corollary 2. With one stream, we cannot find strings' minimum periods.

Proof. Consider the proof of Theorem 4. Notice that, if we could find s's minimum period,
then we could store s in log^{O(1)} n bits by writing n and one copy of its repeated substring t.
It follows that we cannot find strings' minimum periods. □

Corollary 2 may at first seem to contradict work by Ergün, Muthukrishnan and Sahinalp
[13], who gave streaming algorithms for determining approximate periodicity. Whereas
we are concerned with strings which are truly periodic, however, they were concerned with
strings in which the copies of the repeated substring can differ to some extent. To see why
this is an important difference, consider the simple case of checking whether s has period n/2
(i.e., whether or not it is a square). Suppose we know the two halves of s are either identical
or differ in exactly one position, and we want to determine whether s truly has period n/2;
then we must compare each corresponding pair of characters and, by a crossing-sequences
argument (see, e.g., [27] for details of a similar argument), this takes Ω(n/m) passes. Now
suppose we care only whether the two halves of s match in nearly all positions; then we
need compare only a few randomly chosen pairs to decide correctly with high probability.

Theorem 6. With one stream, we cannot even check strings' minimum periods. 

In the conference version of this paper we left as an open problem proving whether or not
multiple streams are useful for grammar-based compression. As we noted in the introduction,
in a subsequent paper with Gawrychowski [17] we showed that with constant memory and
logarithmic passes over a constant number of streams, we can approximate the smallest
grammar with |APPROX| = O(|OPT|^2), answering our question affirmatively.

4 Entropy-only bounds 

Kosaraju and Manzini [25] pointed out that proving an algorithm universal does not
necessarily tell us much about how it behaves on low-entropy strings. In other words, showing
that an algorithm encodes s in nH_k(s) + o(n) bits is not very informative when nH_k(s) =
o(n). For example, although the well-known LZ78 compression algorithm [36] is universal,
|LZ78(1^n)| = Ω(√n) while nH_0(1^n) = 0. To analyze how algorithms perform on low-entropy
strings, we would like to get rid of the o(n) term and prove bounds that depend only on
nH_k(s). Unfortunately, this is impossible since, as the example above shows, even nH_0(s)
can be 0 for arbitrarily long strings.
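
The LZ78 example can be checked with a small Python sketch. It is a textbook-style LZ78
parser written for illustration (the name lz78_phrases is ours), and it counts phrases rather
than output bits; on the unary string 1^n the phrases have lengths 1, 2, 3, ..., so their
number grows roughly like √(2n) even though nH_0(1^n) = 0.

```python
def lz78_phrases(s):
    """Parse s into LZ78 phrases, each a (dictionary index, next character) pair."""
    dictionary = {"": 0}             # phrase -> index; index 0 is the empty phrase
    phrases = []
    current = ""
    for ch in s:
        if current + ch in dictionary:
            current += ch            # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                      # leftover input that matches a known phrase
        phrases.append((dictionary[current], ""))
    return phrases
```

Calling len(lz78_phrases("1" * n)) for growing n shows the phrase count increasing with the
square root of n, so no bound of the form c · nH_0(s) can hold for LZ78.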

It is not hard to show that only unary strings have H_0(s) = 0. For k ≥ 1, recall that
H_k(s) = (1/n) Σ_{|w|=k} |w_s| H_0(w_s). Therefore, H_k(s) = 0 if and only if each distinct k-tuple
w in s is always followed by the same distinct character. This is because, if w is always
followed by the same distinct character, then w_s is unary, H_0(w_s) = 0 and w contributes
nothing to the sum in the formula. Manzini [26] defined the kth-order modified empirical
entropy H*_k(s) such that each context w contributes at least ⌊log |w_s|⌋ + 1 to the sum. Because
modified empirical entropy is more complicated than empirical entropy (e.g., it allows for
variable-length contexts) we refer readers to Manzini's paper for the full definition. In our
proofs in this paper, we use only the fact that

\[
nH_k(s) \le nH_k^*(s) \le nH_k(s) + O(\sigma^k \log n) .
\]



Manzini showed that, for some algorithms and all k simultaneously, it is possible to bound
the encoding's length in terms of only nH*_k(s) and a constant g_k that depends only on σ and k;
he called such bounds 'entropy-only'. In particular, he showed that an algorithm based on the
Burrows-Wheeler Transform (BWT) [7] stores any string s in at most (5 + ε) nH*_k(s) + log n + g_k
bits for all k simultaneously (since nH*_k(s) ≥ log(n − k), we could remove the log n term by
adding 1 to the coefficient 5 + ε).

Theorem 7 (Manzini, 2001). Using the BWT, move-to-front coding, run-length coding 
and arithmetic coding, we can achieve an entropy-only bound. 

The BWT sorts the characters in a string into the lexicographical order of the suffixes 
that immediately follow them. When using the BWT for compression, it is customary to 
append a special character $ that is lexicographically less than any in the alphabet. For a 
more thorough description of the BWT, we again refer readers to Manzini's paper. In this 
section we first show how we can compute and invert the BWT with two streams and, thus, 
achieve entropy-only bounds. We then show that we cannot achieve entropy-only bounds 
with only one stream. In other words, two streams are necessary and sufficient for us to 
achieve entropy-only bounds. 

One of the most common ways to compute the BWT is by building a suffix array. In his 
PhD thesis, Ruhl introduced the StreamSort model [2H2], which is similar to the read/write 
streams model with one stream, except that it has an extra primitive that sorts the stream 
in one pass. Among other things, he showed how to build a suffix array efficiently in this 
model. 

Theorem 8 (Ruhl, 2003). In the StreamSort model, we can build a suffix array using
O(log n) bits of memory and O(log n) passes.

Corollary 3. With two streams, we can compute the BWT using O(log n) bits of memory
and O(log^2 n) passes.

Proof. We can compute the BWT in the StreamSort model by appending $ to s, building a
suffix array, and replacing each value i in the array by the (i − 1)st character in s (replacing
either 0 or 1 by $, depending on where we start counting). This takes O(log n) bits of memory
and O(log n) passes. Since we can sort with two streams using O(log n) bits of memory and
O(log n) passes (see, e.g., [31]), each of the O(log n) sorting passes costs O(log n) passes over
two streams, so it follows that we can compute the BWT using O(log n) bits of memory and
O(log^2 n) passes. □
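
The recipe in the proof (append $, build a suffix array, replace each entry by the preceding
character) can be sketched in Python. This is an in-memory illustration only: it sorts
suffixes naively instead of using Ruhl's StreamSort construction, it assumes $ is smaller
than every character of s, and the names suffix_array and bwt are ours.

```python
def suffix_array(t):
    """Starting positions of the suffixes of t in lexicographic order (naive)."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def bwt(s, sentinel="$"):
    """Burrows-Wheeler Transform of s with a sentinel appended."""
    t = s + sentinel
    # For the suffix starting at position i, emit the character just before it;
    # for i = 0 this wraps around to the sentinel (t[-1]).
    return "".join(t[i - 1] for i in suffix_array(t))
```

For example, bwt("banana") returns "annb$aa".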



We note as an aside that, once we have the suffix array for a periodic string, we can
easily find its minimum period. To see why, suppose s has minimum period ℓ, and consider
the suffix u of s that starts in position ℓ + 1. The longest common prefix of s and u has
length n − ℓ, which is maximum; if another suffix v shared a longer common prefix with s,
then s would have period n − |v| < ℓ. It follows that, if the first position in the suffix array
contains i, then the (ℓ + 1)st position contains i − 1 (assuming s terminates with $, so u is
lexicographically less than s). With two streams we can easily find the position ℓ + 1 that
contains i − 1 and then check that s is indeed periodic with period ℓ.
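
The observation above translates into the following Python sketch, which uses 0-based
positions and an in-memory sort in place of the two-stream construction. It assumes s
contains at least two full copies of its repeated substring and that $ is smaller than every
character of s; the name minimum_period is ours. The final check makes the function return
None rather than a wrong answer if the candidate turns out not to be a period.

```python
def minimum_period(s, sentinel="$"):
    """Find the minimum period of s from the lexicographic ranks of its suffixes."""
    if not s:
        return None
    t = s + sentinel
    order = sorted(range(len(t)), key=lambda i: t[i:])   # naive suffix sorting
    rank = [0] * len(t)
    for r, i in enumerate(order):
        rank[i] = r
    # The suffix ranked just below s itself starts at the candidate period.
    candidate = order[rank[0] - 1]
    # Verify that shifting s by `candidate` positions really matches.
    if all(s[i] == s[i + candidate] for i in range(len(s) - candidate)):
        return candidate
    return None
```

For example, minimum_period("abcabcabcab") returns 3.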

Corollary 4. With two streams, we can compute a string's minimum period using O(log n)
bits of memory and O(log^2 n) passes.

Now suppose we are given a permutation π on n + 1 elements as a list π(1), ..., π(n + 1),
and asked to rank it, i.e., to compute the list π^0(1), ..., π^n(1). This problem is a special case
of list ranking (see, e.g., [3]) and has a surprisingly long history. For example, Knuth [24,
Solution 24] described an algorithm, which he attributed to Hardy, for ranking a permutation
with two tapes. More recently, Bird and Mu [5] showed how to invert the BWT by ranking
a permutation. Therefore, reinterpreting Hardy's result in terms of the read/write streams
model gives us the following bounds.

Theorem 9 (Hardy, c. 1967). With two streams, we can rank a permutation using O(log n)
bits of memory and O(log^2 n) passes.

Corollary 5. With two streams, we can invert the BWT using O(log n) bits of memory and
O(log^2 n) passes.

Proof. The BWT has the property that, if a character is the ith in BWT(s), then its successor
in s is the lexicographically ith in BWT(s) (breaking ties by order of appearance). Therefore,
we can invert the BWT by replacing each character by its lexicographic rank, ranking the
resulting permutation, replacing each value i by the ith character of BWT(s), and rotating
the string until $ is at the end. This takes O(log n) memory and O(log^2 n) passes. □
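
A minimal Python sketch of inverting the BWT by ranking a permutation follows. It is an
in-memory illustration, not Hardy's two-tape algorithm: the permutation is followed by
direct pointer chasing, and we use the map from each lexicographic rank to the position
holding that character (an assumption about direction, chosen so that following the
permutation walks forward through s). The names rank_permutation and inverse_bwt are ours.

```python
def rank_permutation(pi, start=0):
    """Return the list pi^0(start), pi^1(start), ..., pi^(n-1)(start)."""
    out, i = [], start
    for _ in range(len(pi)):
        out.append(i)
        i = pi[i]
    return out


def inverse_bwt(last, sentinel="$"):
    """Recover s from BWT(s), where `last` contains exactly one sentinel."""
    # pi[r] = position in `last` of the r-th smallest character,
    # breaking ties by order of appearance.
    pi = sorted(range(len(last)), key=lambda i: (last[i], i))
    rows = rank_permutation(pi)
    chars = [last[r] for r in rows]
    # Rotate until the sentinel is at the end, then drop it.
    cut = chars.index(sentinel) + 1
    return "".join(chars[cut:] + chars[:cut])[:-1]
```

For example, inverse_bwt("annb$aa") returns "banana".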
Since we can compute and invert move-to-front, run-length and arithmetic coding using
O(log n) bits of memory and O(1) passes over one stream, by combining Theorem 7 and
Corollaries 3 and 5 we obtain the following theorem.

Theorem 10. With two streams, we can achieve an entropy-only bound using O(log n) bits
of memory and O(log^2 n) passes.

It follows from Theorem 10 and a result by Hernich and Schweikardt [23] that we can
achieve an entropy-only bound using O(1) bits of memory, O(log^3 n) passes and four streams.
It follows from their theorem below that, with more streams, we can even reduce the number
of passes to O(log n).

Theorem 11 (Hernich and Schweikardt, 2008). If we can solve a problem with logarithmic
work space, then we can solve it using O(1) bits of memory and O(log n) passes over
O(1) streams.

Corollary 6. With O(1) streams, we can achieve an entropy-only bound using O(1) bits of
memory and O(log n) passes.

Proof. To compute the ith character of BWT(s), we find the ith lexicographically smallest
suffix. To find this suffix, we loop through all the suffixes and, for each, count how many
other suffixes are lexicographically less. Comparing two suffixes character by character takes
O(n^2) time, so we use a total of O(n^4) time; it does not matter now how much time we use,
however, just that we need only a constant number of O(log n)-bit counters. Since we can
compute the BWT with logarithmic work space, it follows from Theorem 11 that we can
compute it, and thereby achieve an entropy-only bound, with O(1) bits of memory and
O(log n) passes over O(1) streams. □

Although we have not been able to prove an Ω(log n) lower bound on the number of
passes needed to achieve an entropy-only bound with O(1) streams, we have been able to
prove such a bound for computing the BWT. Our idea is to reduce sorting to the BWT,
since Grohe and Schweikardt [21] showed we cannot sort n numbers with o(log n) passes
over O(1) streams. It is trivial, of course, to reduce sorting to the BWT if the alphabet is
large enough (e.g., linear in n) but our reduction is to the more reasonable problem of
computing the BWT of a ternary string.

Theorem 12. With O(1) streams, we cannot compute the BWT using o(log n) passes.

Fig. 1. Examples of binary De Bruijn cycles of orders 3 and 4.



Proof. Suppose we are given a sequence of n numbers x_1, ..., x_n, each of 2 log n bits. Grohe
and Schweikardt showed we cannot generally sort such a sequence using o(log n) passes over
O(1) tapes. We now use o(log n) passes to turn x_1, ..., x_n into a ternary string s such that,
by calculating BWT(s), we sort x_1, ..., x_n. It follows from this reduction that we cannot
compute the BWT using o(log n) passes, either.

With one pass, O(log n) bits of memory and two tapes, for 1 ≤ i ≤ n and 1 ≤ j ≤ 2 log n,
we replace the jth bit x_i[j] of x_i by x_i[j] 2 x_i i j, writing 2 as a single character, x_i in
2 log n bits, i in log n bits and j in log log n + 1 bits; the resulting string s is of length
2n log n (3 log n + log log n + 2). The only characters followed by 2s in s are the bits at the
beginning of replacement phrases, so the last 2n log n characters of BWT(s) are the bits of
x_1, ..., x_n; moreover, since the lexicographic order of equal-length binary strings is the same
as their numeric order, the x_i[j] bits will be arranged by the x_i values, with ties broken by
the i values (so if x_i = x_{i'} with i < i', then every x_i[j] comes before every x_{i'}[j']) and
further ties broken by the j values; therefore, the last 2n log n bits of the transformed string
are x_1, ..., x_n in sorted order. □
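
The reduction can be tried out on small inputs with the following Python sketch. It is an
illustration under assumptions of ours: indices are 0-based, field widths are rounded up
with ceilings, each number must fit in 2⌈log n⌉ bits, the BWT is computed by naive in-memory
suffix sorting, and the names bwt_of_terminated and sort_via_bwt are hypothetical.

```python
from math import ceil, log2


def bwt_of_terminated(t):
    """Naive BWT of a string t that already ends with the sentinel '$'."""
    order = sorted(range(len(t)), key=lambda i: t[i:])
    return "".join(t[i - 1] for i in order)


def sort_via_bwt(xs):
    """Sort non-negative integers by reading them off the BWT of a ternary string."""
    n = len(xs)
    if n == 0:
        return []
    log_n = max(1, ceil(log2(n)))
    w = 2 * log_n                        # bits per number x_i
    w_i = log_n                          # bits for the index i
    w_j = max(1, ceil(log2(w)))          # bits for the bit position j
    phrases = []
    for i, x in enumerate(xs):
        x_bits = format(x, f"0{w}b")
        for j in range(w):
            # Replace bit x_i[j] by the phrase: x_i[j] 2 x_i i j
            phrases.append(x_bits[j] + "2" + x_bits
                           + format(i, f"0{w_i}b") + format(j, f"0{w_j}b"))
    last = bwt_of_terminated("".join(phrases) + "$")
    # Suffixes starting with 2 sort last, so the final n*w characters of the
    # BWT are the bits x_i[j] ordered by (x_i, i, j).
    bits = last[-n * w:]
    return [int(bits[k:k + w], 2) for k in range(0, n * w, w)]
```

For example, sort_via_bwt([2, 1]) returns [1, 2].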

To show we need at least two streams to achieve entropy-only bounds, we use De Bruijn
cycles in a proof similar to the one for Theorem 4. A σ-ary De Bruijn cycle of order k is a
cyclic sequence in which every possible k-tuple appears exactly once. For example, Figure 1
shows binary De Bruijn cycles of orders 3 and 4. Our argument this time is based on Lemma 2
and the results below about De Bruijn cycles. We note as a historical aside that Theorem 13
was first proven for the binary case in 1894 by Flye Sainte-Marie [15], but his result was
later forgotten; De Bruijn [6] gave a similar proof for that case in 1946, then in 1951 he and
Van Aardenne-Ehrenfest [1] proved the general version we state here.

Lemma 3. If s ∈ d* for some σ-ary De Bruijn cycle d of order k, then nH*_k(s) =
O(σ^k log n).

Proof. By definition, each distinct k-tuple is always followed by the same distinct character;
therefore nH_k(s) = 0 and, since nH*_k(s) ≤ nH_k(s) + O(σ^k log n), the bound follows. □
Theorem 13 (Van Aardenne-Ehrenfest and De Bruijn, 1951). There are (σ!)^{σ^{k−1}} / σ^k
σ-ary De Bruijn cycles of order k.

Corollary 7. We cannot store most kth-order De Bruijn cycles in o(σ^k log σ) bits.

Proof. By Stirling's Formula, log((σ!)^{σ^{k−1}} / σ^k) = Θ(σ^k log σ). □

Since there are σ^k possible k-tuples, kth-order De Bruijn cycles have length σ^k, so
Corollary 7 means that we cannot compress most De Bruijn cycles by more than a constant factor.
Therefore, we can prove a lower bound similar to Theorem 4 by supposing that s's repeated
substring is a De Bruijn cycle, then using Lemma 3 instead of Lemma 1.
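
The facts about De Bruijn cycles used here can be checked on small cases with the Python
sketch below. It generates one cycle with the standard Lyndon-word (FKM) construction, which
is our choice and not something the paper prescribes, verifies the defining property, and
evaluates the count from Theorem 13; all function names are hypothetical.

```python
from math import factorial


def de_bruijn(sigma, k):
    """One sigma-ary De Bruijn cycle of order k (FKM / Lyndon-word construction)."""
    a = [0] * (sigma * k)
    seq = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, sigma):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def is_de_bruijn(seq, sigma, k):
    """Check that every k-tuple occurs exactly once in the cyclic sequence."""
    doubled = seq * 2
    tuples = {tuple(doubled[i:i + k]) for i in range(len(seq))}
    return len(seq) == sigma ** k and len(tuples) == sigma ** k


def count_de_bruijn(sigma, k):
    """Number of sigma-ary De Bruijn cycles of order k, as in Theorem 13."""
    return factorial(sigma) ** (sigma ** (k - 1)) // sigma ** k


def followers(s, k):
    """Map each k-tuple of s to the set of characters that follow it."""
    out = {}
    for i in range(len(s) - k):
        out.setdefault(s[i:i + k], set()).add(s[i + k])
    return out
```

Repeating a cycle of order k several times gives a string in which every k-tuple has exactly
one follower, so its kth-order empirical entropy is 0 even though the cycle itself is, for
most cycles, incompressible.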

Theorem 14. With one stream, we cannot achieve an entropy-only bound. 

Proof. As in the proof of Theorem 4, suppose an algorithm uses only one stream, m bits of
memory and p passes to compress s, with mp = log^{O(1)} n, and consider a machine
performing that algorithm. This time, however, suppose s is binary and periodic with
period mp f(n), where f(n) = O(log n) is a function that increases without bound;
furthermore, suppose s's repeated substring t is a kth-order De Bruijn cycle, k = log(mp f(n)),
that is not compressible by more than a constant factor. Lemma 2 implies that the
machine's output while over a part of the stream that initially holds a copy of t must be
Ω(mp f(n) − mp) = Ω(mp f(n)) bits. Therefore, the machine's complete output must be Ω(n)
bits. By Lemma 3, however, nH*_k(s) = O(2^k log n) = O(mp f(n) log n) ⊂ log^{O(1)} n. □

Recall that in Section [2] we asserted the following claim, which we are now ready to prove. 

Theorem 15. If we use m bits of memory and p passes over one stream and achieve
universal compression with an O(σ^k r) redundancy term, for all k simultaneously, then
mpr = Ω(n / f(n)) for any function f that increases without bound.

Proof. Consider the proof of Theorem 14: nH_k(s) = 0 but we must output Ω(n) bits, so
r = Ω(n / σ^k) = Ω(n / (mp f(n))). □

Notice Theorem 14 also implies a lower bound for computing the BWT: if we could
compute the BWT with one stream then, since we can compute move-to-front, run-length
and arithmetic coding using O(log n) bits of memory and O(1) passes over one stream, we
could thus achieve an entropy-only bound with one stream, contradicting Theorem 14.

Corollary 8. With one stream, we cannot compute the BWT. 
In the conference version of this paper [16] we closed with a brief discussion of three
entropy-only bounds that we proved with Manzini [18]. Our first bound was an improved
analysis of the BWT followed by move-to-front, run-length and arithmetic coding (which
lowered the coefficient from 5 + ε to 4.4 + ε), but our other bounds (one of which had a
coefficient of 2.69 + ε) were analyses of the BWT followed by algorithms which we were not
sure could be implemented with O(1) streams. We now realize that, since both of these other
algorithms can be computed with logarithmic work space, it follows from Theorem 11 that
they can indeed be computed with O(1) streams.

After having proven that we cannot compute the BWT with one stream, we promptly
started working with Ferragina and Manzini on a practical algorithm [14] that does exactly
that. However, that algorithm does not fit into the streaming models we have considered in
this paper; in particular, the product of the internal memory and passes there is O(n log n)
bits, but we use only n bits of workspace on the disk. The existence of a practical algorithm
for computing the BWT in external memory raises the question of whether we can query
BWT-based compressed indexes quickly in external memory. Chien et al. [10] proved lower
bounds for indexed pattern matching in the external-memory model, but that model
does not distinguish between sequential and random access to blocks. The read/write-streams
model is also inappropriate for analyzing the complexity of this task, since we can trivially 
use only one pass over one stream if we leave the text uncompressed and scan it all with 
a classic sequential pattern-matching algorithm. Orlandi and Venturini [30] recently showed 
how we can store a sample of the BWT that lets us estimate what parts of the full BWT we 
need to read in order to answer a query. If we modify their data structure slightly, we can 
make it recursive; i.e., with a smaller sample we can estimate what parts of the sample we 
need to read in order to estimate what parts of the full BWT we need to read. Suppose we 
store on disk a set of samples whose sizes increase exponentially, finishing with the BWT 
itself. We use each sample in turn to estimate what parts of the next sample we need to 
read, then read them into internal memory using only one pass over the next sample. This 
increases the size of the whole index only slightly and lets us answer queries by reading 
few blocks and in the order they appear on disk. We are currently working to optimize and 
implement this idea. 